Lingthusiasm Vowel Plots

Contents

  • 3.1 Setup
  • 3.2 Plot Vowel Means
  • 3.3 Plot Individual Data Points
  • 3.4 Plot Word Means


Part 3: Plotting the Vowels

Author

Bethany Gardner

Published

January 30, 2024


Now to actually make the vowel plots! This document goes into detail about how I decided to make them the way I did and how to implement them in ggplot, but if you just want to see the final results, jump down to here.

3.1 Setup

library(tidyverse)
library(magrittr)
library(ggtext)
library(ggforce)
library(ggrepel)
library(rcartocolor)
library(png)
library(patchwork)

options(dplyr.summarise.inform = FALSE)
1
Data wrangling (tidyr, dplyr, purrr, stringr), ggplot2 for plotting.
2
Pipe operator.
3
Markdown/HTML formatting for text in plots.
4
Ellipsis plots.
5
Offset text labels from points.
6
Color themes.
7
Open PNG images.
8
Add images on top of plots.
9
Don’t print a message every time summarise() is called on a grouped dataframe.

(Note: this could be done in Python, but I strongly prefer the ggplot package for plotting.)

Data

Load the vowel formant data from Part 2:

formants <- read.csv("data/formants.csv", stringsAsFactors = TRUE) %>%
  select(-Vowel_Time, -Count) %>%
  mutate(
    Speaker = ifelse(Speaker == "G", "Gretchen", "Lauren"),
    List = ifelse(
      List == "episode", "Lingthusiasm Episodes", "Wells Lexical Set"
    )
  ) %>%
  mutate(across(where(is.character), as.factor))

str(formants)
1
Read formants data from 2_annotate_audio.qmd, and keep columns for List, Vowel, Word, Speaker, F1, and F2.
2
Recode the values for Speaker and List from abbreviations to full strings for plot labels, then make them both factors.
'data.frame':   397 obs. of  6 variables:
 $ List   : Factor w/ 2 levels "Lingthusiasm Episodes",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Vowel  : Factor w/ 11 levels "ɑ","æ","ɔ","ə",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Word   : Factor w/ 47 levels "among","another",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Speaker: Factor w/ 2 levels "Gretchen","Lauren": 1 1 1 1 1 2 2 2 2 2 ...
 $ F1     : num  766 602 733 766 626 ...
 $ F2     : num  1260 1254 1175 1373 1260 ...

IPA Symbols

However, the IPA symbols aren’t encoded correctly. They’ll render in RStudio, but not when Quarto renders the document to HTML, and not reliably when ggplot renders the plots. This isn’t what we want:

ɑ, æ, ɔ, ə, ɛ, i, ɪ, o, u, ʊ, ʌ

So, the next step is to enter the unicode values manually (copied from this Wikipedia page):

vowels <- c(
  "i_lower"   = "\u0069",  # i (close front unrounded)
  "i_upper"   = "\u026A",  # ɪ (near-close front unrounded)
  "epsilon"   = "\u025B",  # ɛ (open-mid front unrounded)
  "ash"       = "\u00E6",  # æ (near-open front unrounded)
  "schwa"     = "\u0259",  # ə (mid central)
  "horseshoe" = "\u028A",  # ʊ (near-close near-back rounded)
  "u"         = "\u0075",  # u (close back rounded)
  "o"         = "\u006F",  # o (close-mid back rounded)
  "hat"       = "\u028C",  # ʌ (open-mid back unrounded)
  "open_o"    = "\u0254",  # ɔ (open-mid back rounded)
  "alpha"     = "\u0251"   # ɑ (open back unrounded)
)

These are ordered from front to back, then close to open (Figure 4).
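As a quick sanity check on hand-entered escapes, the code point behind each string can be printed back out. This is a minimal sketch using only base R; `vowels_check` and `code_points` are illustrative names, not part of the original code, and only two of the eleven vowels are shown.

```r
# Two entries copied from the vowels vector above
vowels_check <- c("schwa" = "\u0259", "alpha" = "\u0251")

# utf8ToInt() returns the code point of a single character;
# sprintf() formats it in the usual U+XXXX notation
code_points <- vapply(
  vowels_check,
  function(v) sprintf("U+%04X", utf8ToInt(v)),
  character(1)
)

code_points
```

If a printed value doesn't match the code listed for that symbol on the reference page, the escape was mistyped.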

Then match the unicode for the IPA symbol to the words:

formants %<>% mutate(
  Vowel = case_when(
    Word %in% c("ball", "father", "honorific", "lot", "palm", "start") ~ vowels["alpha"],
    Word %in% c("bang", "bath", "hand", "laugh", "trap") ~ vowels["ash"],
    Word %in% c("bought", "cloth", "core", "north", "thought", "wrong") ~ vowels["open_o"],
    Word %in% c("among", "famous", "support") ~ vowels["schwa"],
    Word %in% c("bet", "dress", "guest", "says", "square") ~ vowels["epsilon"],
    Word %in% c("beat", "believe", "fleece", "people") ~ vowels["i_lower"],
    Word %in% c("bit", "finish", "kit", "near", "pin") ~ vowels["i_upper"],
    Word %in% c("force", "goat") ~ vowels["o"],
    Word %in% c("blue", "goose", "through", "who") ~ vowels["u"],
    Word %in% c("could", "cure", "put", "foot") ~ vowels["horseshoe"],
    Word %in% c("another", "but", "fun", "strut") ~ vowels["hat"],
  ) %>% factor(levels = vowels, ordered = TRUE)
)

str(formants)
1
If the value in the Word column is ball, father, honorific, lot, palm, or start, then assign the alpha value from the vowels list.
2
Convert character to factor, then specify the order of the factors (same as in vowels list above) to make sure it stays consistent.
'data.frame':   397 obs. of  6 variables:
 $ List   : Factor w/ 2 levels "Lingthusiasm Episodes",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Vowel  : Ord.factor w/ 11 levels "i"<"ɪ"<"ɛ"<"æ"<..: 5 5 5 5 5 5 5 5 5 5 ...
  ..- attr(*, "names")= chr [1:397] "schwa" "schwa" "schwa" "schwa" ...
 $ Word   : Factor w/ 47 levels "among","another",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Speaker: Factor w/ 2 levels "Gretchen","Lauren": 1 1 1 1 1 2 2 2 2 2 ...
 $ F1     : num  766 602 733 766 626 ...
 $ F2     : num  1260 1254 1175 1373 1260 ...

Now the IPA vowels consistently render correctly:

i, ɪ, ɛ, æ, ə, ʊ, u, o, ʌ, ɔ, ɑ
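To see what the ordered factor does, here is a small base R sketch with three of the vowels. The names `vowel_levels`, `toy_vowels`, and `toy_factor` are made up for illustration, and the toy data is not from the formants file.

```r
# A shortened version of the vowels vector: the order here defines the level order
vowel_levels <- c("i_lower" = "\u0069", "schwa" = "\u0259", "alpha" = "\u0251")

# Hypothetical vowel values in arbitrary order
toy_vowels <- c("\u0251", "\u0069", "\u0259", "\u0069")

# Same call pattern as above: levels come from the vector, and the factor is ordered
toy_factor <- factor(toy_vowels, levels = vowel_levels, ordered = TRUE)

# Integer codes are positions in vowel_levels, not alphabetical positions
as.integer(toy_factor)

# Sorting follows the level order (i, then schwa, then alpha)
sort(toy_factor)
```

Because the level order is fixed up front, anything that depends on it (legend order, color assignment) should stay the same from plot to plot.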

Lingthusiasm Theme

The Lingthusiasm font is Josefin Sans, which is available from Google Fonts.

I downloaded and installed it to my computer. There are a number of different ways to add new fonts without having to install them separately outside of RStudio, such as font_add_google() from the showtext package. However, that method was causing errors rendering the IPA symbols.

systemfonts::system_fonts() shows the list of fonts installed on my computer that R recognizes, and it finds Josefin Sans:

systemfonts::system_fonts() %>%
  filter(str_detect(family, "Josefin Sans")) %>%
  select(path, name, family) %>%
  pivot_longer(cols = everything())
1
Get dataframe of fonts available.
2
Filter to include Josefin Sans.
3
Select columns to print; flip to list vertically.
# A tibble: 3 × 2
  name   value                                                                  
  <chr>  <chr>                                                                  
1 path   "C:\\Users\\betha\\AppData\\Local\\Microsoft\\Windows\\Fonts\\JosefinS…
2 name   "JosefinSans-Thin"                                                     
3 family "Josefin Sans"                                                         

However, the fonts loaded by default just include Times New Roman, Arial, and Courier New:

windowsFonts()
$serif
[1] "TT Times New Roman"

$sans
[1] "TT Arial"

$mono
[1] "TT Courier New"

This tells R to load Josefin Sans into the set of available fonts, so text will render in Josefin Sans if family = "sans_alt", but stick with the default sans font otherwise (and not break the IPA symbols).

windowsFonts(sans_alt = "Josefin Sans")
windowsFonts()
$serif
[1] "TT Times New Roman"

$sans
[1] "TT Arial"

$mono
[1] "TT Courier New"

$sans_alt
[1] "Josefin Sans"
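windowsFonts() only exists on Windows, so the call above will fail elsewhere. A minimal sketch of a guard, assuming you only want the mapping registered when it is available; `register_sans_alt()` is a hypothetical helper, not part of the original code, and other platforms would need a different font setup that isn't covered here.

```r
# Hypothetical helper: register the "sans_alt" family only on Windows.
# Returns TRUE if the mapping was registered, FALSE if it was skipped.
register_sans_alt <- function(family = "Josefin Sans") {
  if (.Platform$OS.type != "windows") {
    return(FALSE)
  }
  # windowsFont() wraps the family name in the form windowsFonts() expects
  grDevices::windowsFonts(sans_alt = grDevices::windowsFont(family))
  TRUE
}

registered <- register_sans_alt()
registered
```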

The hex codes for the green and navy are:

lingthusiasm_green <- "#26b14c"
lingthusiasm_navy <- "#051458"

And the logo:

lingthusiasm_logo <- readPNG("resources/lingthusiasm_logo_circle.png",  native = TRUE)
lingthusiasm_tagline <- readPNG("resources/lingthusiasm_logo_tagline.png",  native = TRUE)
1
Read logo images. native = TRUE specifies reading it as a raster object instead of an array, which is the format patchwork::inset_element() needs.

Putting it together:

tibble(
  "Color" = c("green", "navy"),
  "Hex" = c(lingthusiasm_green, lingthusiasm_navy),
  "Extra_Col" = c(1, 1)
) %>%
  ggplot(aes(x = Color, y = Extra_Col, fill = Hex, label = Hex)) +
  geom_tile() +
  geom_text(size = 10, color = "white") +
  scale_fill_identity() +
  theme_classic() +
  labs(title = "Lingthusiasm Theme") +
  theme(
    plot.title = element_text(
      family = "sans_alt", size = 28,
      margin = margin(t = 1, b = 1, unit = "lines"), hjust = 0.55,
      color = lingthusiasm_navy
    ),  
    axis.text = element_blank(), axis.title = element_blank(),
    axis.line = element_blank(), axis.ticks = element_blank()
  ) +
  inset_element(
    p = lingthusiasm_logo,
    left = unit(0.05, "snpc"), right = unit(0.25, "snpc"),
    top = unit(1.2, "snpc"), bottom = unit(1, "snpc")
  )
1
Color is names of the two lingthusiasm theme colors.
2
Hex is the two hex codes.
3
Extra_Col is a dummy value because ggplot needs a Y axis.
4
The X axis is Color, and the Y axis is Extra_Col, which just creates two boxes next to each other. Fill and label are specified by Hex.
5
Draw a square for each color.
6
Label the squares with the hex code strings (keeping the default font).
7
Fill the squares with the hex code color values.
8
Set the plot title text to be navy, Josefin Sans, size 28, centered with some space above and below.
9
Remove the axis lines, labels, titles, and ticks.
10
Use the patchwork package to add the logo image on top of the plot. This step needs to be last.
11
Specify the positions for each corner of the logo, using snpc units so it stays square even if the overall plot is rectangular.

Figure 1: Lingthusiasm colors, font, and logo.

3.2 Plot Vowel Means

Now let’s take a look at the data! F1 gets plotted on the Y axis, and F2 gets plotted on the X axis.

means_1 <- formants %>%
  group_by(Speaker, List, Vowel) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, label = Vowel)) +
  geom_textbox(
    fill = lingthusiasm_green, box.colour = NA,
    color = "white", size = 4.5, halign = 0.5, valign = 0.5,
    width = unit(0.10, "snpc"), height = unit(0.10, "snpc"),
    box.padding = unit(c(0, 0, 0, 0), "snpc"), box.r = unit(0.01, "snpc")
  ) +  
  facet_grid(Speaker ~ List) +
  theme_classic() +
  theme(
    axis.line = element_line(color = lingthusiasm_navy),
    axis.ticks = element_line(color = lingthusiasm_navy),
    panel.border = element_rect(color = lingthusiasm_navy, fill = NA),
    strip.background = element_rect(color = lingthusiasm_navy),
    text = element_text(size = 12, family = "sans_alt", color = lingthusiasm_navy),
    axis.text = element_text(color = lingthusiasm_navy),
    strip.text = element_text(color = lingthusiasm_navy, size = 12)
  ) +
  labs(title = "Vowel Means")

means_1
1
Take the full data set, group it by Speaker then List then Vowel, and then calculate the means of F1 and F2 for each Speaker x List x Vowel.
2
All layers of the plot have F2 on the X axis, F1 on the Y axis, and are labelled by Vowel.
3
Write the vowel symbols (because label = Vowel) at the location of their means.
4
Make the text box background lingthusiasm green with no outline.
5
Make the text white, size 4.5 (note that this is on a different scale than the rest of the text sizes specified in later theme()), vertically and horizontally centered.
6
Set the size of the text boxes, using snpc (square normalized parent coordinates) to be relative to the size of the plot but always square.
7
No margins inside the text boxes and a slight curve on the corners.
8
Split the plot to have Gretchen’s data in the top panels and Lauren’s data in the bottom panels, and the data from the Lingthusiasm episodes in the left panels and the data from the Wells lexical set recordings in the right panels.
9
Change the default theme to have a white background with no grid lines.
10
Change all the lines (axis lines, axis ticks, outline around panels, outline around panel labels) to be the lingthusiasm navy.
11
Make all the text navy Josefin Sans. Set the base size to 12, and set the panel label text to 12 as well, which makes it bigger than its default (panel labels are normally drawn smaller than the base size).
12
Set the title, and leave the other axis/legend labels as their default values of “F1”, “F2”, and “Vowel.”

Figure 2: Vowel means (default axes).
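The grouped means can be cross-checked without dplyr. This is a sketch in base R on made-up numbers (the `toy` data frame is hypothetical and only has Speaker and Vowel as grouping columns), meant to show what one mean per group looks like rather than to reproduce the real values.

```r
# Hypothetical formant measurements, two speakers x two vowels
toy <- data.frame(
  Speaker = c("A", "A", "A", "B", "B", "B"),
  Vowel   = c("i", "i", "u", "i", "u", "u"),
  F1      = c(300, 320, 350, 280, 340, 360),
  F2      = c(2300, 2500, 900, 2400, 1000, 1100)
)

# One row per Speaker x Vowel combination, with the mean of F1 and of F2
toy_means <- aggregate(cbind(F1, F2) ~ Speaker + Vowel, data = toy, FUN = mean)

toy_means
```

Each row of `toy_means` corresponds to one text box in a plot like Figure 2.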

(Sidenote: saving the theme specifications so we don’t have to keep retyping theme.)

lingthusiasm_theme <- theme(
  axis.line = element_line(color = lingthusiasm_navy),
  axis.ticks = element_line(color = lingthusiasm_navy),
  panel.border = element_rect(color = lingthusiasm_navy, fill = NA),
  strip.background = element_rect(color = lingthusiasm_navy),
  text = element_text(size = 12, family = "sans_alt", color = lingthusiasm_navy),
  axis.text = element_text(color = lingthusiasm_navy),
  strip.text = element_text(color = lingthusiasm_navy, size = 12)
) 

However, vowel plots typically have their axes reversed, so that the highest value of (F1, F2) is at the bottom left corner instead of the top right corner. This isn’t standard data visualization procedure, but it has a cool and useful result.

means_2 <- formants %>%
  group_by(List, Speaker, Vowel) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, label = Vowel)) +
  geom_textbox(
    fill = lingthusiasm_green, box.colour = NA,
    color = "white", size = 4.5, halign = 0.5, valign = 0.5,
    width = unit(0.10, "snpc"), height = unit(0.10, "snpc"),
    box.padding = unit(c(0, 0, 0, 0), "snpc"), box.r = unit(0.01, "snpc")
  ) +  
  facet_grid(Speaker ~ List) +
  scale_x_reverse(breaks = c(1000, 1500, 2000, 2500)) +
  scale_y_reverse(limits = c(1050, 225), n.breaks = 4) +
  theme_classic() +
  lingthusiasm_theme +
  labs(title = "Vowel Means")

means_2
1
Just annotating the lines that changed from the previous chunk.
2
Add this to flip the X axis. Specify breaks because the default values aren’t even.
3
Add this to flip the Y axis. The limits (see how they’re reversed) are specified because the defaults were a bit too narrow, and like this the axis ticks/labels are spaced more evenly.
4
Add this to specify the colors and font sizes etc.

Figure 3: Vowel means (reversed axes).

Now the layout resembles the IPA vowel chart! Front vowels are on the left, and back vowels are on the right; close vowels are on the top, and open vowels are on the bottom.

Figure 4: IPA Vowel Chart.
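The effect of reversing both axes is easiest to see on a tiny example. This sketch uses three made-up vowel means (the numbers and the names `toy_means` and `toy_plot` are hypothetical) and none of the Lingthusiasm styling.

```r
library(ggplot2)

# Hypothetical means for a close front, a close back, and an open vowel
toy_means <- data.frame(
  Vowel = c("i", "u", "a"),
  F1 = c(300, 350, 800),
  F2 = c(2400, 900, 1400)
)

toy_plot <- ggplot(toy_means, aes(x = F2, y = F1, label = Vowel)) +
  geom_text() +
  scale_x_reverse() +  # high F2 (front vowels) moves to the left
  scale_y_reverse()    # low F1 (close vowels) moves to the top

toy_plot
```

With both scales reversed, "i" should land in the top left, "u" in the top right, and "a" near the bottom, matching the layout of the IPA chart.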

3.3 Plot Individual Data Points

Just plotting the means for each vowel loses a lot of information, so let’s take a look at the underlying data.

Now, we’ll distinguish between vowels by color. First, make a legend that will be easier to read than the default by creating a string that prints each vowel in its corresponding color (using the ggtext package to render the HTML formatting).

vowel_key <- tibble("Vowel" = vowels, "Color" = carto_pal(12, "Bold")[1:11]) %>%
  mutate(Styled = str_c("<b style='color:", Color, "'>", Vowel, "</b>")) %>%
  pull(Styled) %>%
  str_flatten(collapse = ", ")

vowel_key %>% str_wrap(32) %>% str_view()
1
Vowel column is the list of vowels (unicode codes). Color column is the hex codes from the Bold palette in rcartocolor, the color set we’ve been using so far. (Using the first 11 values from the full palette, so the last color isn’t gray.)
2
Encase with HTML code, so that hex code becomes a color argument for the vowel character.
3
Merge into 1 string, with each value separated by a comma + space.
4
Print, wrapping lines on each item.
[1] │ <b style='color:#7F3C8D'>i</b>,
    │ <b style='color:#11A579'>ɪ</b>,
    │ <b style='color:#3969AC'>ɛ</b>,
    │ <b style='color:#F2B701'>æ</b>,
    │ <b style='color:#E73F74'>ə</b>,
    │ <b style='color:#80BA5A'>ʊ</b>,
    │ <b style='color:#E68310'>u</b>,
    │ <b style='color:#008695'>o</b>,
    │ <b style='color:#CF1C90'>ʌ</b>,
    │ <b style='color:#F97B72'>ɔ</b>,
    │ <b style='color:#4B4B8F'>ɑ</b>

Which will render like this:

i, ɪ, ɛ, æ, ə, ʊ, u, o, ʌ, ɔ, ɑ
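The same string-building step can be sketched in base R for two of the vowels, which makes the structure of the HTML easier to see. The colors are the first two hex codes from the printed output above; `toy_vowels`, `toy_colors`, `styled`, and `toy_key` are illustrative names only.

```r
toy_vowels <- c("i", "\u026A")
toy_colors <- c("#7F3C8D", "#11A579")

# Wrap each vowel in a bold tag whose inline style sets its color
styled <- paste0("<b style='color:", toy_colors, "'>", toy_vowels, "</b>")

# Join the pieces into one comma-separated string, like str_flatten() above
toy_key <- paste(styled, collapse = ", ")

toy_key
```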

(Note that ggplot will throw a warning like Warning in text_info(label, fontkey, fontfamily, font, fontsize, cache): unable to translate '<U+0251>png215' to native encoding, but it renders correctly, so the warnings are turned off in those code chunks.)

points <- formants %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Vowel)) +
  geom_point(size = 1.5) +
  facet_grid(Speaker ~ List) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(family = "sans")) +
  labs(title = "Individual Data Points", subtitle = vowel_key) +
  guides(color = guide_none())

points
1
Passing the full data set, not the means by Speaker + Vowel + List, to ggplot.
2
Instead of geom_textbox(), use geom_point() to draw a scatterplot layer, with the point size set explicitly.
3
Use the Bold color palette from the rcartocolor package to color-code the vowels. (There are 11 vowels, but I specify 12 colors here so the grey gets skipped.)
4
The axis range needs to be slightly bigger than in the plots of vowel means, so the breaks are adjusted so that the Y axis labels don’t overlap with each other between the two panels.
5
element_markdown() from ggtext will render the HTML string. Use default sans serif font because Josefin Sans doesn’t have all of the IPA symbols.
6
Add the color-coded list of vowels as a subtitle.
7
Turn the default legend off.

Figure 5: Individual data points.

3.4 Plot Word Means

The data for each vowel consists of 3 different words. How different are they? First, let’s look at the Wells Lexical Set.

These next plots use the ggrepel package to make the word labels not overlap with each other or with the scatterplot points.

words_ls <- formants %>%
  filter(List == "Wells Lexical Set") %>%
  group_by(Speaker, Vowel, Word) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Word)) +
  geom_point(size = 1.25) +
  geom_label_repel(
    min.segment.length = 0,
    size = 4,
    force = 75,
    family = "sans_alt",
    seed = 2024
  ) +
  facet_wrap(~Speaker) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(size = 15, family = "sans")) +
  labs(title = "Means By Word: Wells Lexical Set", subtitle = vowel_key) +
  guides(color = guide_none())

words_ls
1
Only include Wells Lexical Set word list.
2
The data for this plot is the means by Speaker, Vowel, AND Word.
3
All layers of this plot have F2 on the X axis, F1 on the Y axis, are color-coded by Vowel, and are labelled by Word.
4
Draw a point at each Speaker*Vowel*Word mean (size 1.25, slightly smaller than the points in Figure 5).
5
Draw text box labels offset from each point. geom_label_repel() makes sure none of the boxes overlap with the points or with each other.
6
Always draw a line from the text box to the scatterplot point.
7
Text size. Note this is on a different scale than the text for the title/axis labels.
8
Increase the amount of space required between the text boxes.
9
Make font Josefin Sans.
10
Set a seed so the results are consistent.
11
Put Gretchen on the left and Lauren on the right.
12
Specify limits and locations of labels/breaks, since the defaults aren’t even.
13
Include color-coded vowel string as a subtitle, instead of the default legend for color.

Figure 6: Mean for each word in the Wells Lexical Set word list.

Now let’s look at the Lingthusiasm episode words:

words_ep_1 <- formants %>%
  filter(List == "Lingthusiasm Episodes") %>%
  group_by(Speaker, Vowel, Word) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Word)) +
  geom_point(size = 1.25) +
  geom_label_repel(
    min.segment.length = 0, force = 75, seed = 2024,
    size = 4, family = "sans_alt"
  ) +
  facet_wrap(~Speaker, ncol = 1) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(size = 15, family = "sans")) +
  labs(title = "Means By Word: Lingthusiasm Episodes", subtitle = vowel_key) +
  guides(color = guide_none())

words_ep_1
1
Only include Lingthusiasm Episode word list.
2
The panels are stacked vertically, because the word labels take up more space. fig-asp: 1.25 in this code chunk’s header makes it render tall enough.

Figure 7: Mean for each word in the Lingthusiasm Episode word list.

One thing that makes this plot a bit hard to interpret is that it’s not immediately clear which vowel in the word is the one being plotted. So, let’s make the vowel bold relative to the rest of the word.

So far we’ve been using ggtext to format text, but that doesn’t work with ggrepel. The workaround, like with the IPA vowels, is to just enter the unicode characters directly.

These are the codes for the Mathematical Sans Serif capital letters, in regular and bold faces. They’re copy-pasted in here manually even though the pattern is predictable, because procedurally generating strings with the \u prefix is a pain.

alphabet_reg <- c(
  "\U1D5A0", "\U1D5A1", "\U1D5A2", "\U1D5A3", "\U1D5A4", "\U1D5A5",
  "\U1D5A6", "\U1D5A7", "\U1D5A8", "\U1D5A9", "\U1D5AA", "\U1D5AB",
  "\U1D5AC", "\U1D5AD", "\U1D5AE", "\U1D5AF", "\U1D5B0", "\U1D5B1",
  "\U1D5B2", "\U1D5B3", "\U1D5B4", "\U1D5B5", "\U1D5B6", "\U1D5B7",
  "\U1D5B8", "\U1D5B9"
)

alphabet_bold <- c(
  "\U1D5D4", "\U1D5D5", "\U1D5D6", "\U1D5D7", "\U1D5D8", "\U1D5D9",
  "\U1D5DA", "\U1D5DB", "\U1D5DC", "\U1D5DD", "\U1D5DE", "\U1D5DF",
  "\U1D5E0", "\U1D5E1", "\U1D5E2", "\U1D5E3", "\U1D5E4", "\U1D5E5",
  "\U1D5E6", "\U1D5E7", "\U1D5E8", "\U1D5E9", "\U1D5EA", "\U1D5EB",
  "\U1D5EC", "\U1D5ED"
)

names(alphabet_reg) <- letters[1:26]
names(alphabet_bold) <- letters[1:26]
1
Name vectors with regular letters, so values can be looked up by letter, similar to a Python dictionary.
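For comparison, here is one way the same two vectors could be generated from code points instead of typed out. This is an alternative sketch, not the approach used above: it relies on base R's intToUtf8() and on the two alphabets being contiguous blocks starting at U+1D5A0 and U+1D5D4. The names `generated_reg` and `generated_bold` are made up.

```r
# 26 consecutive code points starting at sans-serif capital A (regular, then bold);
# multiple = TRUE returns one string per code point instead of one long string
generated_reg  <- intToUtf8(0x1D5A0 + 0:25, multiple = TRUE)
generated_bold <- intToUtf8(0x1D5D4 + 0:25, multiple = TRUE)

# Name by lowercase letter, as with alphabet_reg and alphabet_bold
names(generated_reg)  <- letters
names(generated_bold) <- letters

generated_reg[c("a", "z")]
```

The first and last values should match the first and last escapes in the hand-typed vectors.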

First, convert the whole word to the regular letters:

to_unicode_caps <- function(word, alphabet_reg) {
  chars <- str_split(as.character(word), pattern = "")[[1]]
  converted <- ""
  for (l in chars) {
    converted <- str_c(converted, alphabet_reg[[l]])
  }
  return(converted)
}
1
Function takes word as a string and alphabet_reg as a named vector.
2
Split the word into a vector of individual letters.
3
Start string for the converted word.
4
For each letter, use the fact that alphabet_reg is named with the regular letters to get the unicode string for the current letter. Concatenate the letter pulled from alphabet_reg to the converted string.
5
Return the converted word as one string.
words_unicode <- map(formants$Word, to_unicode_caps, alphabet_reg)

formants %<>% mutate(.after = Word, Word_Label = words_unicode) %>%
  unnest(Word_Label)
1
For each item in the Word column of the formants dataframe, call the function to_unicode_caps() (defined in previous code chunk) on it. Pass alphabet_reg as the second argument to to_unicode_caps().
2
Insert the words_unicode into the formants dataframe as a column called Word_Label, after the Word column.
3
Convert the items in Word_Label from lists containing 1 string to just strings.

Which renders as:

𝖠𝖬𝖮𝖭𝖦, 𝖠𝖭𝖮𝖳𝖧𝖤𝖱, 𝖡𝖠𝖫𝖫, 𝖡𝖠𝖭𝖦, 𝖡𝖤𝖠𝖳, 𝖡𝖤𝖫𝖨𝖤𝖵𝖤, 𝖡𝖤𝖳, 𝖡𝖨𝖳, 𝖡𝖫𝖴𝖤, 𝖡𝖮𝖴𝖦𝖧𝖳, 𝖡𝖴𝖳, 𝖢𝖮𝖱𝖤, 𝖢𝖮𝖴𝖫𝖣, 𝖥𝖠𝖬𝖮𝖴𝖲, 𝖥𝖠𝖳𝖧𝖤𝖱, 𝖥𝖨𝖭𝖨𝖲𝖧, 𝖥𝖮𝖮𝖳, 𝖥𝖴𝖭, 𝖦𝖴𝖤𝖲𝖳, 𝖧𝖠𝖭𝖣, 𝖧𝖮𝖭𝖮𝖱𝖨𝖥𝖨𝖢, 𝖫𝖠𝖴𝖦𝖧, 𝖯𝖤𝖮𝖯𝖫𝖤, 𝖯𝖨𝖭, 𝖯𝖴𝖳, 𝖲𝖠𝖸𝖲, 𝖲𝖴𝖯𝖯𝖮𝖱𝖳, 𝖳𝖧𝖱𝖮𝖴𝖦𝖧, 𝖶𝖧𝖮, 𝖶𝖱𝖮𝖭𝖦, 𝖡𝖠𝖳𝖧, 𝖢𝖫𝖮𝖳𝖧, 𝖢𝖴𝖱𝖤, 𝖣𝖱𝖤𝖲𝖲, 𝖥𝖫𝖤𝖤𝖢𝖤, 𝖥𝖮𝖱𝖢𝖤, 𝖦𝖮𝖠𝖳, 𝖦𝖮𝖮𝖲𝖤, 𝖪𝖨𝖳, 𝖫𝖮𝖳, 𝖭𝖤𝖠𝖱, 𝖯𝖠𝖫𝖬, 𝖲𝖰𝖴𝖠𝖱𝖤, 𝖲𝖳𝖠𝖱𝖳, 𝖲𝖳𝖱𝖴𝖳, 𝖳𝖧𝖮𝖴𝖦𝖧𝖳, 𝖳𝖱𝖠𝖯
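The conversion can also be written without a loop, since indexing a named vector with a vector of letters looks them all up at once. This is a base R sketch of the same idea, not a replacement for the function above; `reg`, `to_caps_base()`, and `toy_labels` are hypothetical names, and the alphabet is built with intToUtf8() only to keep the example short.

```r
# Regular sans-serif capitals, named by lowercase letter (assumed contiguous block)
reg <- intToUtf8(0x1D5A0 + 0:25, multiple = TRUE)
names(reg) <- letters

# Hypothetical helper: split into letters, look them all up, paste back together
to_caps_base <- function(word, alphabet) {
  chars <- strsplit(as.character(word), "")[[1]]
  paste(alphabet[chars], collapse = "")
}

toy_labels <- vapply(c("among", "ball"), to_caps_base, character(1), alphabet = reg)

toy_labels
```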

Then convert the corresponding vowels to bold face.

formants %<>% mutate(
  Word_Label = ifelse(
    Word %in% c(
      "ball", "bang", "bath", "beat", "father", "famous", "goat", "hand",
      "laugh", "near", "palm", "says", "square", "start", "trap"
    ),
    str_replace(Word_Label, alphabet_reg["a"], alphabet_bold["a"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c(
      "beat", "bet", "blue", "dress", "fleece", "guest", "near", "people"
    ),
    str_replace(Word_Label, alphabet_reg["e"], alphabet_bold["e"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c("bit", "finish", "kit", "pin"),
    str_replace(Word_Label, alphabet_reg["i"], alphabet_bold["i"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c(
      "among", "another", "bought", "cloth", "core", "could",
      "force", "goat", "honorific", "lot", "people", "thought",
      "through", "who", "wrong"
    ),
    str_replace(Word_Label, alphabet_reg["o"], alphabet_bold["o"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c(
      "bought", "blue", "but", "could", "fun", "guest", "laugh",
      "put", "strut", "square", "support", "thought", "through"
    ),
    str_replace(Word_Label, alphabet_reg["u"], alphabet_bold["u"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word == "believe",
    str_replace(
      Word_Label,
      str_c(alphabet_reg["i"], alphabet_reg["e"]),
      str_c(alphabet_bold["i"], alphabet_bold["e"])
    ),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c("goose", "foot"),
    str_replace_all(
      Word_Label,
      str_c(alphabet_reg["o"], alphabet_reg["o"]),
      str_c(alphabet_bold["o"], alphabet_bold["o"])
    ),
    Word_Label
  ),
  Word_Label = ifelse(
    Word == "fleece",
    str_replace(Word_Label, alphabet_reg["e"], alphabet_bold["e"]),
    Word_Label
  )
)
formants$Word_Label %<>% as.factor()
1
Mutating the Word_Label column multiple times, because several words have multiple vowels to swap. Swapping one vowel at a time is shorter than swapping one category of word at a time.
2
First modification to Word_Label is all the words where “A” gets bolded.
3
If the value in the Word column is one of these items
4
Then pass the value of the Word_Label column to str_replace(). Replace the “a” from the regular-face set with the “a” from the bold-face set.
5
If the value in the Word column is not any of those words, keep the value of Word_Label the same.
6
Same logic for all the words where “E” gets bolded.
7
Same logic for all the words where “I” gets bolded.
8
Same logic for all the words where “O” gets bolded.
9
Same logic for all the words where “U” gets bolded.
10
There are a couple of exceptions: “believe”, because that’s the word where the second instance of the vowel gets bolded, not the first one. Replace the consecutive “I” and “E” from the regular-face set with the “I” and “E” from the bold-face set.
11
“Goose” and “foot” are the words where both O’s need to be bolded.
12
“Fleece” needs the first two, but not the third E bolded.
13
Convert Word_Label from character to factor.

Which renders as:

𝖠𝖬𝗢𝖭𝖦, 𝖠𝖭𝗢𝖳𝖧𝖤𝖱, 𝖡𝖤𝖫𝗜𝗘𝖵𝖤, 𝖡𝖫𝗨𝗘, 𝖡𝗔𝖫𝖫, 𝖡𝗔𝖭𝖦, 𝖡𝗔𝖳𝖧, 𝖡𝗘𝖳, 𝖡𝗘𝗔𝖳, 𝖡𝗜𝖳, 𝖡𝗢𝗨𝖦𝖧𝖳, 𝖡𝗨𝖳, 𝖢𝖫𝗢𝖳𝖧, 𝖢𝖴𝖱𝖤, 𝖢𝗢𝖱𝖤, 𝖢𝗢𝗨𝖫𝖣, 𝖣𝖱𝗘𝖲𝖲, 𝖥𝖫𝗘𝗘𝖢𝖤, 𝖥𝗔𝖬𝖮𝖴𝖲, 𝖥𝗔𝖳𝖧𝖤𝖱, 𝖥𝗜𝖭𝖨𝖲𝖧, 𝖥𝗢𝖱𝖢𝖤, 𝖥𝗢𝗢𝖳, 𝖥𝗨𝖭, 𝖦𝗢𝗔𝖳, 𝖦𝗢𝗢𝖲𝖤, 𝖦𝗨𝗘𝖲𝖳, 𝖧𝗔𝖭𝖣, 𝖧𝗢𝖭𝖮𝖱𝖨𝖥𝖨𝖢, 𝖪𝗜𝖳, 𝖫𝗔𝗨𝖦𝖧, 𝖫𝗢𝖳, 𝖭𝗘𝗔𝖱, 𝖯𝗔𝖫𝖬, 𝖯𝗘𝗢𝖯𝖫𝖤, 𝖯𝗜𝖭, 𝖯𝗨𝖳, 𝖲𝖰𝗨𝗔𝖱𝖤, 𝖲𝖳𝖱𝗨𝖳, 𝖲𝖳𝗔𝖱𝖳, 𝖲𝗔𝖸𝖲, 𝖲𝗨𝖯𝖯𝖮𝖱𝖳, 𝖳𝖧𝖱𝗢𝗨𝖦𝖧, 𝖳𝖧𝗢𝗨𝖦𝖧𝖳, 𝖳𝖱𝗔𝖯, 𝖶𝖧𝗢, 𝖶𝖱𝗢𝖭𝖦
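The reason some words need special handling comes down to first-match versus all-match replacement. A small base R sketch of that difference, using sub() and gsub() as stand-ins for str_replace() and str_replace_all(); the vectors `reg` and `bold` and the other names are illustrative, and the alphabets are generated with intToUtf8() to keep the example short.

```r
reg  <- intToUtf8(0x1D5A0 + 0:25, multiple = TRUE)
bold <- intToUtf8(0x1D5D4 + 0:25, multiple = TRUE)
names(reg)  <- letters
names(bold) <- letters

# "fleece" in the regular sans-serif capitals
fleece <- paste(reg[c("f", "l", "e", "e", "c", "e")], collapse = "")

# Replace only the first regular E with a bold E
first_only <- sub(reg[["e"]], bold[["e"]], fleece, fixed = TRUE)

# Replace every regular E, which would also bold the final silent E
all_e <- gsub(reg[["e"]], bold[["e"]], fleece, fixed = TRUE)

# Applying the first-match replacement twice bolds the first two E's only
first_two <- sub(reg[["e"]], bold[["e"]], first_only, fixed = TRUE)

c(first_only, all_e, first_two)
```

This is why "fleece" gets a second single replacement in the code above instead of a replace-all.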

Now we can see which vowels are being plotted more clearly (but with a reminder of how messy English orthography is):

words_ep_2 <- formants %>%
  filter(List == "Lingthusiasm Episodes") %>%
  group_by(Speaker, Vowel, Word_Label) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Word_Label)) +
  geom_point(size = 1.25) +
  geom_label_repel(min.segment.length = 0, force = 75, seed = 2024, size = 4) +
  facet_wrap(~Speaker, ncol = 1) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(size = 15, family = "sans")) +
  labs(title = "Means By Word: Lingthusiasm Episodes", subtitle = vowel_key) +
  guides(color = guide_none())

words_ep_2
1
Replace Word with Word_Label.
2
Replace Word with Word_Label here too.

Figure 8: Mean for each word in the Lingthusiasm Episode word list.

This was one of the points where I (from the northeast US) realized how little experience I have with Australian accents: I wasn’t entirely sure how much of the messiness in Lauren’s back vowel data was because her back vowels are in different locations, and how much was because I picked words where she uses a different vowel than Gretchen and I do.

Save the plots:

ggsave(
  plot = means_1, path = "plots", filename = "1_means_original.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  plot = means_2, path = "plots", filename = "2_means_flipped.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  plot = points, path = "plots", filename = "3_individual_points.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  plot = words_ls, path = "plots", filename = "4_words_lexical_set.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  plot = words_ep_2, path = "plots", filename = "4_words_episodes.png",
  width = 8, height = 8, units = "in", device = png
)
1
Need to specify device = png (not leave default or device = "png") to get Josefin Sans font to render correctly.

Citation

BibTeX citation:
@online{gardner2024,
  author = {Gardner, Bethany},
  title = {Lingthusiasm {Vowel} {Plots}},
  date = {2024-01-30},
  url = {https://bethanyhgardner.github.io/lingthusiasm-vowel-plots},
  langid = {en}
}
For attribution, please cite this work as:
Gardner, Bethany. 2024. “Lingthusiasm Vowel Plots.” January 30, 2024. https://bethanyhgardner.github.io/lingthusiasm-vowel-plots.